Marketing analysis


I - General introduction

Origin The present dataframe has been created for marketing analysis purposes. It assembles various personal information about 2239 customers, such as their education level, income, age, marital status, number of children at home…

It also shows their consuming habits (amount spent on wine, on sweets…) and the number of purchases made on discounted products.

There is very few context concerning this dataframe, since the source is unknown. It is not clear when these informations were registered, but probably by 2014 since the date of customers’ enrollment within the company doesn’t go further than 2014.

Aims To predict the customer’s behavior (Number of purchases made with a discount) depending on the most significant personal attributes To categorize participants in a few typical profiles (probably with PCA)

Attributes

  • People

ID: Customer’s unique identifier Year_Birth: Customer’s birth year Education: Customer’s education level Marital_Status: Customer’s marital status Income: Customer’s yearly household income Kidhome: Number of children in customer’s household Teenhome: Number of teenagers in customer’s household Dt_Customer: Date of customer’s enrollment with the company Recency: Number of days since customer’s last purchase Complain: 1 if customer complained in the last 2 years, 0 otherwise

  • Products

MntWines: Amount spent on wine in last 2 years MntFruits: Amount spent on fruits in last 2 years MntMeatProducts: Amount spent on meat in last 2 years MntFishProducts: Amount spent on fish in last 2 years MntSweetProducts: Amount spent on sweets in last 2 years MntGoldProds: Amount spent on gold in last 2 years

  • Promotions

NumDealsPurchases: Number of purchases made with a discount AcceptedCmp1: 1 if customer accepted the offer in the 1st campaign, 0 otherwise AcceptedCmp2: 1 if customer accepted the offer in the 2nd campaign, 0 otherwise AcceptedCmp3: 1 if customer accepted the offer in the 3rd campaign, 0 otherwise AcceptedCmp4: 1 if customer accepted the offer in the 4th campaign, 0 otherwise AcceptedCmp5: 1 if customer accepted the offer in the 5th campaign, 0 otherwise Response: 1 if customer accepted the offer in the last campaign, 0 otherwise NumStorePurchases: Number of purchases made directly in stores

Before loading the dataset, we want to make sure we have all the necessary packages installed and loaded, and that the code can be run by anybody.

if(!require(pacman)) {
  install.packages("pacman")
  library(pacman)
}
suppressPackageStartupMessages(pacman::p_load(tidyverse, gtsummary, ggpubr, moments, here, sjPlot, parameters, effectsize, pander))

path =  here("JULIETTE") 
setwd(path)
data <- read.table("marketing_campaign.csv", header=T, sep="\t")

II - Data overview and cleaning

Structure

str(data)
## 'data.frame':    2240 obs. of  29 variables:
##  $ ID                 : int  5524 2174 4141 6182 5324 7446 965 6177 4855 5899 ...
##  $ Year_Birth         : int  1957 1954 1965 1984 1981 1967 1971 1985 1974 1950 ...
##  $ Education          : chr  "Graduation" "Graduation" "Graduation" "Graduation" ...
##  $ Marital_Status     : chr  "Single" "Single" "Together" "Together" ...
##  $ Income             : int  58138 46344 71613 26646 58293 62513 55635 33454 30351 5648 ...
##  $ Kidhome            : int  0 1 0 1 1 0 0 1 1 1 ...
##  $ Teenhome           : int  0 1 0 0 0 1 1 0 0 1 ...
##  $ Dt_Customer        : chr  "04-09-2012" "08-03-2014" "21-08-2013" "10-02-2014" ...
##  $ Recency            : int  58 38 26 26 94 16 34 32 19 68 ...
##  $ MntWines           : int  635 11 426 11 173 520 235 76 14 28 ...
##  $ MntFruits          : int  88 1 49 4 43 42 65 10 0 0 ...
##  $ MntMeatProducts    : int  546 6 127 20 118 98 164 56 24 6 ...
##  $ MntFishProducts    : int  172 2 111 10 46 0 50 3 3 1 ...
##  $ MntSweetProducts   : int  88 1 21 3 27 42 49 1 3 1 ...
##  $ MntGoldProds       : int  88 6 42 5 15 14 27 23 2 13 ...
##  $ NumDealsPurchases  : int  3 2 1 2 5 2 4 2 1 1 ...
##  $ NumWebPurchases    : int  8 1 8 2 5 6 7 4 3 1 ...
##  $ NumCatalogPurchases: int  10 1 2 0 3 4 3 0 0 0 ...
##  $ NumStorePurchases  : int  4 2 10 4 6 10 7 4 2 0 ...
##  $ NumWebVisitsMonth  : int  7 5 4 6 5 6 6 8 9 20 ...
##  $ AcceptedCmp3       : int  0 0 0 0 0 0 0 0 0 1 ...
##  $ AcceptedCmp4       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ AcceptedCmp5       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ AcceptedCmp1       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ AcceptedCmp2       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Complain           : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Z_CostContact      : int  3 3 3 3 3 3 3 3 3 3 ...
##  $ Z_Revenue          : int  11 11 11 11 11 11 11 11 11 11 ...
##  $ Response           : int  1 0 0 0 0 0 0 0 1 0 ...

Summary

pander(summary(data))
Table continues below
ID Year_Birth Education Marital_Status
Min. : 0 Min. :1893 Length:2240 Length:2240
1st Qu.: 2828 1st Qu.:1959 Class :character Class :character
Median : 5458 Median :1970 Mode :character Mode :character
Mean : 5592 Mean :1969 NA NA
3rd Qu.: 8428 3rd Qu.:1977 NA NA
Max. :11191 Max. :1996 NA NA
NA NA NA NA
Table continues below
Income Kidhome Teenhome Dt_Customer
Min. : 1730 Min. :0.0000 Min. :0.0000 Length:2240
1st Qu.: 35303 1st Qu.:0.0000 1st Qu.:0.0000 Class :character
Median : 51382 Median :0.0000 Median :0.0000 Mode :character
Mean : 52247 Mean :0.4442 Mean :0.5062 NA
3rd Qu.: 68522 3rd Qu.:1.0000 3rd Qu.:1.0000 NA
Max. :666666 Max. :2.0000 Max. :2.0000 NA
NA’s :24 NA NA NA
Table continues below
Recency MntWines MntFruits MntMeatProducts
Min. : 0.00 Min. : 0.00 Min. : 0.0 Min. : 0.0
1st Qu.:24.00 1st Qu.: 23.75 1st Qu.: 1.0 1st Qu.: 16.0
Median :49.00 Median : 173.50 Median : 8.0 Median : 67.0
Mean :49.11 Mean : 303.94 Mean : 26.3 Mean : 166.9
3rd Qu.:74.00 3rd Qu.: 504.25 3rd Qu.: 33.0 3rd Qu.: 232.0
Max. :99.00 Max. :1493.00 Max. :199.0 Max. :1725.0
NA NA NA NA
Table continues below
MntFishProducts MntSweetProducts MntGoldProds NumDealsPurchases
Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.000
1st Qu.: 3.00 1st Qu.: 1.00 1st Qu.: 9.00 1st Qu.: 1.000
Median : 12.00 Median : 8.00 Median : 24.00 Median : 2.000
Mean : 37.53 Mean : 27.06 Mean : 44.02 Mean : 2.325
3rd Qu.: 50.00 3rd Qu.: 33.00 3rd Qu.: 56.00 3rd Qu.: 3.000
Max. :259.00 Max. :263.00 Max. :362.00 Max. :15.000
NA NA NA NA
Table continues below
NumWebPurchases NumCatalogPurchases NumStorePurchases NumWebVisitsMonth
Min. : 0.000 Min. : 0.000 Min. : 0.00 Min. : 0.000
1st Qu.: 2.000 1st Qu.: 0.000 1st Qu.: 3.00 1st Qu.: 3.000
Median : 4.000 Median : 2.000 Median : 5.00 Median : 6.000
Mean : 4.085 Mean : 2.662 Mean : 5.79 Mean : 5.317
3rd Qu.: 6.000 3rd Qu.: 4.000 3rd Qu.: 8.00 3rd Qu.: 7.000
Max. :27.000 Max. :28.000 Max. :13.00 Max. :20.000
NA NA NA NA
Table continues below
AcceptedCmp3 AcceptedCmp4 AcceptedCmp5 AcceptedCmp1
Min. :0.00000 Min. :0.00000 Min. :0.00000 Min. :0.00000
1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
Median :0.00000 Median :0.00000 Median :0.00000 Median :0.00000
Mean :0.07277 Mean :0.07455 Mean :0.07277 Mean :0.06429
3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
Max. :1.00000 Max. :1.00000 Max. :1.00000 Max. :1.00000
NA NA NA NA
AcceptedCmp2 Complain Z_CostContact Z_Revenue Response
Min. :0.00000 Min. :0.000000 Min. :3 Min. :11 Min. :0.0000
1st Qu.:0.00000 1st Qu.:0.000000 1st Qu.:3 1st Qu.:11 1st Qu.:0.0000
Median :0.00000 Median :0.000000 Median :3 Median :11 Median :0.0000
Mean :0.01339 Mean :0.009375 Mean :3 Mean :11 Mean :0.1491
3rd Qu.:0.00000 3rd Qu.:0.000000 3rd Qu.:3 3rd Qu.:11 3rd Qu.:0.0000
Max. :1.00000 Max. :1.000000 Max. :3 Max. :11 Max. :1.0000
NA NA NA NA NA

Densities

data %>% 
  keep(is.numeric) %>%
  gather() %>%
  ggplot(aes(value)) +
  facet_wrap(~key, scales = "free") + 
  geom_histogram()

We are only interested in the total number of promotions accepted by the customers, since we don’t have details about the nature of each promotion.

data$AcceptedCmpTotal <- data$AcceptedCmp1 + data$AcceptedCmp2 + data$AcceptedCmp3 + data$AcceptedCmp4 + data$AcceptedCmp5 + data$Response 

The dataframe contains many variables, some are superfluous for our analysis (web visits and purchases, cumplains, catalog purchases, Z_Revenus and Z_CostContact which we don’t have information about)

data$Complain <- data$NumWebVisitsMonth <- data$NumWebPurchases <- data$NumCatalogPurchases <- data$Z_Revenue <- data$Z_CostContact <- data$AcceptedCmp1 <- data$AcceptedCmp2 <- data$AcceptedCmp3 <- data$AcceptedCmp4 <- data$AcceptedCmp5 <- data$Response <- NULL

We want to calculate the age of the customers. If we proceed with : “2021 - data$Year_Birth”, we would get their current age. It makes more sense to get their age at the moment the data was registered, so we proceed with 2014 minus dataYear_Birth, although here we are only assuming that it was indeed registered in 2014.

data$age <- 2014 - data$Year_Birth 
plot(data$age)

We see 3 outliers who seems to be older than 110 years old. The corresponding birth years are 1893, 1900 and 1899. The first one could be corrected by 1993, the second one would be due to 2 typing errors which is improbable, and the third could be replaced by 1999 but it corresponds to someone who has a PhD education level, which is unlikely at age 15. Since the dataset is very big, we can choose to delete these lines.

# which(data$age>110)
data <- data[-c(193, 240, 340),]

Marital status can be simplified in only a few levels, and transformed into a factor. Since the “other” section represents less than 1% of the participants, it is not enough to model it as a factor.

We then transform some relevant variables into factors.

data$Marital_Status <-  factor(data$Marital_Status, labels = c("Other", "Single", "Single", "Married", "Single", "Together", "Single", "Other"))
data$Marital_Status[data$Marital_Status=="Other"] <- NA; data$Marital_Status = droplevels(data$Marital_Status)
data$ID <- factor(data$ID) 
data$Education <- factor(data$Education) 
data$Teenhome <-  factor(data$Teenhome)
data$AcceptedCmpTotal <-  factor(data$AcceptedCmpTotal)
data$Kidhome <-  factor(data$Kidhome, labels = c("no", "yes", "yes"))

For Kidhome, we fused the answers “1” and “2” because there are only 2% of “2” which is not enough information to model it a one separate factor.

We now want to plot all the variables again, and check again whether anything is abnormal.

data %>% 
  keep(is.numeric) %>%
  gather() %>%
  ggplot(aes(value)) +
  facet_wrap(~key, scales = "free") + 
  geom_histogram()

III - Hypotheses

Here we will be eyeballing the relationship between various variables, and determine those that seem the most relevant for further analysis.

data %>% 
      filter(!is.na(Marital_Status)) %>% 
      ggplot(aes(MntWines, Marital_Status)) + geom_boxplot(na.rm = TRUE)

ggplot(data, aes(Kidhome, MntSweetProducts))+ geom_boxplot(outlier.colour =  "red") +  geom_point(position = position_jitter()) 

ggplot(data, aes(x=Income, y=NumStorePurchases)) +
  geom_violin(trim=FALSE, fill='#A4A4A4', color="darkred")+
  theme_minimal()

ggplot(data, aes(x=MntWines, y=Marital_Status)) +
  geom_violin(trim=FALSE, fill='#A4A4A4', color="darkred")+
  geom_boxplot(width=0.05) + theme_minimal()

NumStorePurchases seems to be the most useful variable to analyse since marketing analysis might want to determine which profils buy the most in the store. In order to consider NumStorePurchases as a response variables for a linear model, we first have to check normality.

ggplot(data, aes(x = NumStorePurchases)) +
  geom_histogram(aes(y = ..density..),
                 colour = 1, fill = "white") +
  geom_density(adjust = 1) + labs(title = "Purchases",
 caption = paste("skewness =", round(moments::skewness(data$NumStorePurchases, na.rm = TRUE),2)))


# put that into appendix!

# ggplot(data, aes(x = sqrt(NumStorePurchases))) +
#   geom_histogram(aes(y = ..density..),
#                  colour = 1, fill = "white") +
#   geom_density(adjust = 1) + labs(title = "Purchases",
#  caption = paste("skewness =", round(moments::skewness(data$NumStorePurchases, na.rm = TRUE),2)))

IV - Modelling

A. LINEAR MODEL

A.1. Creating the model and assessing assumptions

m1 <- lm(data=data, NumStorePurchases ~ Kidhome*Income*Education*age) 
plot(m1, c(1:2,4), ask=F)

ggplot(data, aes(NumStorePurchases, Income)) + geom_point(color="blue", alpha=0.3, position = position_jitter())

# plot(Income ~NumStorePurchases, col="lightblue", pch=19, cex=2,data)
# text(Income ~NumStorePurchases, labels=ID,data, cex=0.9, font=1)

Here we observe peculiar outliers for “Income”. One income is equal to 666’666, and when income is higher than 150’000

newdf = data %>%
  filter(!ID %in% c(9432, 5555, 4619, 5336, 1501, 1503, 8475, 4931, 11181) ) 
# plot(Income ~NumStorePurchases, col="lightblue", pch=19, cex=2,data=newdf) # odnt need that anbynmore

A.2. Reassessing without the outliers

m1 <- lm(data=newdf, NumStorePurchases ~ Kidhome*Income*Education*age)
plot(m1, c(1:2,4), ask=F)

A.3. Model selection

Here we use the stepAIC function to select the model that has the best AIC.

ms <- MASS::stepAIC(m1, direction = "both", trace = FALSE) #il choisit le meilleur AIC
ms$anova
## Stepwise Model Path 
## Analysis of Deviance Table
## 
## Initial Model:
## NumStorePurchases ~ Kidhome * Income * Education * age
## 
## Final Model:
## NumStorePurchases ~ Kidhome + Income + Education + age + Kidhome:Income + 
##     Kidhome:Education + Income:Education + Kidhome:age + Income:age + 
##     Education:age + Kidhome:Income:Education + Income:Education:age
## 
## 
##                             Step Df  Deviance Resid. Df Resid. Dev      AIC
## 1                                                  2164   11138.08 3650.691
## 2 - Kidhome:Income:Education:age  4 26.205697      2168   11164.28 3647.870
## 3        - Kidhome:Education:age  4 14.097325      2172   11178.38 3642.652
## 4           - Kidhome:Income:age  1  8.240739      2173   11186.62 3642.276

A.4. Computing the final model

finalm1 <- lm(data=newdf,NumStorePurchases ~ Kidhome + Income + Education + age + Kidhome:Income + Kidhome:Education + Income:Education + Kidhome:age + Income:age + Education:age + Kidhome:Income:Education + Income:Education:age)

A.5. Statisical inference

We first use the eta_squared function to compute the effect sizes: everything that has 0.00 on the left of the 90% CI column has a “meaningless” effect size.

parameters::model_parameters(anova(finalm1))
## Parameter                | Sum_Squares |   df | Mean_Square |       F |      p
## ------------------------------------------------------------------------------
## Kidhome                  |     6242.08 |    1 |     6242.08 | 1212.52 | < .001
## Income                   |     5446.16 |    1 |     5446.16 | 1057.92 | < .001
## Education                |       10.29 |    4 |        2.57 |    0.50 | 0.736 
## age                      |       24.65 |    1 |       24.65 |    4.79 | 0.029 
## Kidhome:Income           |        4.87 |    1 |        4.87 |    0.95 | 0.331 
## Kidhome:Education        |       31.63 |    4 |        7.91 |    1.54 | 0.189 
## Income:Education         |       26.11 |    4 |        6.53 |    1.27 | 0.280 
## Kidhome:age              |       36.69 |    1 |       36.69 |    7.13 | 0.008 
## Income:age               |        5.97 |    1 |        5.97 |    1.16 | 0.282 
## Education:age            |       11.41 |    4 |        2.85 |    0.55 | 0.696 
## Kidhome:Income:Education |       52.82 |    4 |       13.20 |    2.56 | 0.037 
## Income:Education:age     |       49.16 |    4 |       12.29 |    2.39 | 0.049 
## Residuals                |    11186.62 | 2173 |        5.15 |         |       
## 
## Anova Table (Type 1 tests)
effectsize::eta_squared(car::Anova(finalm1, type = 2), ci = 0.9, alternative = "two") #modified this a bit
## # Effect Size for ANOVA (Type II)
## 
## Parameter                | Eta2 (partial) |       90% CI
## --------------------------------------------------------
## Kidhome                  |           0.05 | [0.04, 0.07]
## Income                   |           0.31 | [0.29, 0.34]
## Education                |       4.56e-04 | [0.00, 0.00]
## age                      |       2.42e-03 | [0.00, 0.01]
## Kidhome:Income           |       1.20e-03 | [0.00, 0.00]
## Kidhome:Education        |       2.86e-03 | [0.00, 0.01]
## Income:Education         |       2.28e-03 | [0.00, 0.01]
## Kidhome:age              |       3.98e-03 | [0.00, 0.01]
## Income:age               |       7.07e-04 | [0.00, 0.00]
## Education:age            |       1.07e-03 | [0.00, 0.00]
## Kidhome:Income:Education |       5.30e-03 | [0.00, 0.01]
## Income:Education:age     |       4.38e-03 | [0.00, 0.01]

A.6. Plots and summary estimates

Plot estimates

sjPlot::plot_model(finalm1) 

Plot marginal effects

sjPlot::plot_model(finalm1, type = "pred", terms = "Kidhome", show.data  = T, jitter = 1) 
sjPlot::plot_model(finalm1, type = "pred", terms = "Income", show.data  = T, jitter = 1) 
sjPlot::plot_model(finalm1, type = "pred", terms = c("Income", "Kidhome"), show.data  = T, jitter = 1) 
sjPlot::plot_model(finalm1, type = "pred", terms = c("Income", "Education"), show.data  = T, jitter = 1) 
sjPlot::plot_model(finalm1, type = "pred", terms = c("Income", "age"), show.data  = T, jitter = 1) 
sjPlot::plot_model(finalm1, type = "pred", terms = c("age", "Kidhome"), show.data  = T, jitter = 1) 

Summary table of standardized estimates

sjPlot::tab_model(effectsize::standardize(finalm1), rm.terms = c("*Education.Q", "Education^4", "Income:Education.C", "Education.Q" , "Kidhomeyes:Education.Q", "Income:Education.Q", "Kidhomeyes:Income:Education.Q", "Education.C" ,              "Kidhomeyes:Education.C"  ,      "Income:Education.C"  ,"Kidhomeyes:Income:Education.C", "Education^4",  "Kidhomeyes:Education^4", "Income:Education^4" , "Kidhomeyes:Income:Education^4"), show.intercept = F)  #what is this for? # this to show all the STANDARDIZED estimates on a numeric form in a table summarizing all thE information
  Num Store Purchases
Predictors Estimates CI p
Kidhome [yes] -0.58 -0.83 – -0.32 <0.001
Income 0.61 0.48 – 0.74 <0.001
Education [Basic] -0.62 -2.03 – 0.79 0.391
Education [Graduation] -0.07 -0.23 – 0.08 0.356
Education [Master] -0.11 -0.29 – 0.08 0.254
Education [PhD] 0.04 -0.14 – 0.22 0.663
age 0.03 -0.09 – 0.14 0.668
Kidhome [yes] * Income -0.11 -0.35 – 0.13 0.383
Kidhome [yes] * Education
[Basic]
-0.76 -3.00 – 1.47 0.503
Kidhome [yes] * Education
[Graduation]
0.17 -0.10 – 0.45 0.214
Kidhome [yes] * Education
[Master]
0.30 -0.01 – 0.61 0.061
Kidhome [yes] * Education
[PhD]
0.09 -0.22 – 0.40 0.555
Income * Education
[Basic]
-0.32 -1.28 – 0.65 0.522
Income * Education
[Graduation]
-0.03 -0.18 – 0.12 0.684
Income * Education
[Master]
0.01 -0.16 – 0.18 0.894
Income * Education [PhD] -0.17 -0.34 – -0.00 0.048
Kidhome [yes] * age -0.12 -0.20 – -0.04 0.003
Income * age 0.03 -0.07 – 0.13 0.570
Education [Basic] * age -0.66 -1.79 – 0.47 0.253
Education [Graduation] *
age
-0.03 -0.15 – 0.09 0.644
Education [Master] * age 0.02 -0.12 – 0.16 0.761
Education [PhD] * age -0.06 -0.20 – 0.07 0.342
(Kidhome [yes] * Income)
* Education [Basic]
-0.63 -2.07 – 0.81 0.392
(Kidhome [yes] * Income)
* Education [Graduation]
0.12 -0.14 – 0.38 0.374
(Kidhome [yes] * Income)
* Education [Master]
0.23 -0.08 – 0.54 0.153
(Kidhome [yes] * Income)
* Education [PhD]
0.44 0.13 – 0.76 0.005
(Income * Education
[Basic]) * age
-0.47 -1.19 – 0.26 0.209
(Income * Education
[Graduation]) * age
-0.09 -0.20 – 0.02 0.113
(Income * Education
[Master]) * age
-0.04 -0.16 – 0.09 0.574
(Income * Education
[PhD]) * age
0.02 -0.10 – 0.15 0.716
Observations 2204
R2 / R2 adjusted 0.516 / 0.510

B. PRINCIPAL COMPONENT ANALYSIS

clean_data <-data[rowSums(is.na(data))==0, ]
pm1<-prcomp(clean_data[,-c(1,3,4,6,7,8,18)], scale=TRUE)
# summary(pm1)

data2 <- cbind(clean_data, pm1$x) #double check!
# str(data2)
sjPlot::tab_pca(pm1, show.cronb = F, show.var = T, nmbr.fctr = 8)
Principal Component Analysis
  Component 1 Component 2 Component 3 Component 4 Component 5 Component 6 Component 7 Component 8
Year_Birth 0.00 -1.00 0.02 -0.01 0.05 -0.02 0.06 -0.00
Income -0.14 0.12 0.05 -0.02 -0.81 0.10 -0.29 0.20
Recency 0.00 0.01 -0.00 1.00 -0.00 0.01 -0.00 0.01
MntWines -0.15 0.09 0.00 0.02 -0.52 0.18 -0.70 0.01
MntFruits -0.86 -0.01 0.06 -0.01 -0.15 0.11 -0.23 0.12
MntMeatProducts -0.52 -0.02 0.04 0.03 -0.71 0.09 -0.17 0.10
MntFishProducts -0.71 0.02 0.08 -0.01 -0.26 0.22 -0.13 0.31
MntSweetProducts -0.40 -0.01 0.05 0.02 -0.24 0.12 -0.19 0.83
MntGoldProds -0.23 0.03 -0.04 0.01 -0.15 0.93 -0.18 0.10
NumDealsPurchases 0.09 0.04 -0.99 0.00 0.05 0.04 -0.05 -0.04
NumStorePurchases -0.28 0.08 -0.07 -0.01 -0.19 0.12 -0.85 0.20
age -0.00 1.00 -0.02 0.01 -0.05 0.02 -0.06 0.00
Proportion of Variance 36.63 % 16.90 % 9.14 % 8.35 % 6.83 % 5.44 % 4.09 % 3.57 %
Cumulative Proportion 36.63 % 53.53 % 62.67 % 71.01 % 77.84 % 83.28 % 87.36 % 90.94 %
varimax-rotation

We need 8 principal components to attain 0,90